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Abstract 

We propose Ordered Subspace Clustering (OSC) to segment data drawn from a 
sequentially ordered union of subspaces. Similar to Sparse Subspace Clustering 
(SSC) we formulate the problem as one of finding a sparse representation but 
include an additional penalty term to take care of sequential data. We test our 
method on data drawn from infrared hyper spectral, video and motion capture 
data. Experiments show that our method, OSC, outperforms the state of the 
art methods: Spatial Subspace Clustering (SpatSC), Low-Rank Representation 
(LRR) and SSC. 
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1. Introduction 

In many areas such as machine learning and image processing, high dimen¬ 
sional data are ubiquitous. This high dimensionality has adverse effects on the 
computation time and memory requirements of many algorithms. Fortunately, it 
has been shown that high dimensional data often lie in a space of much lower di¬ 
mension than the ambient space mm- This has motivated the creation of many 
dimension reduction techniques. These techniques, such as Principal Compo¬ 
nent Analysis (PCA), assume that the data belongs to a single low dimensional 
subspace [3]. However in reality the data often lies in a union of multiple sub¬ 
spaces. Therefore it is desirable to determine the subspaces in the data so that 
one can apply dimension reduction to each subspace separately. The problem 
of assigning data points to subspaces is known as subspace segmentation. 

Given a data matrix of N observed column-wise samples A = [ai, a 2 , ..., ajv] 
G , where D is the dimension of the data, the objective of subspace 

segmentation is to learn corresponding subspace labels 1 = ■ ■ ■ Jn] G N^. 

Data within A is assumed to be drawn from a union of k subspaces 
of dimensions Both the number of subspaces k and the dimension of 
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the subspace structure. 


(a) Observed data lies in 
disjoint sets of subspaces. 



(c) Final labels for each sample are obtained through spectral clustering on Z. 


Figure 1: An overview of the subspace clustering procedure. The observed 
data such as face images are assumed to lie in a union of lower dimensional 
subspaces. The self expressive property is then used to learn the coefhcients 
that best represent the subspace structure. Lastly spectral clustering is applied 
to the learnt coefficients, which are treated as similarities, to obtain the final 
subspace labels. 


each subspace are unknown. To further complicate the problem it is rarely the 
case that clean data A is observed. Instead we usually observe data which 
has been corrupted by noise. Subspace segmentation is a difficult task since one 
must produce accurate results quickly while contending with numerous unknown 
parameters and large volume of potentially noisy data. 

The use of subspace segmentation as a pre-processing method has not been 
limited to dimensionality reduction. For example it has been used in other 
applications such as image compression [1], image classification [3 [S], feature 
extraction BIHl, image segmentation iniin]. Furthermore state-of-the-art sub¬ 
space segmentation has shown impressive results for pure segmentation tasks 
such as identifying individual rigidly moving objects in video [ni [H [Ml E], 
identifying face images of a subject under varying illumination [HdS], segmen¬ 
tation of human activities and temporal video segmentation |18j . 

This paper is concerned with a variant of subspace segmentation in which 
the data has a sequential structure. The data is assumed to be sampled at 
uniform intervals in either space or time in a single direction. For example 
video data which as a function of time has a sequential structure [HI [19] where 
it is assumed that frames are similar to their consecutive frames (neighbours) 
until the scene ends. Another example is hyper-spectral drill core data [T], which 
is obtained by sampling the infrared reflectance along the length of the core. 
The mineralogy is typically stratified meaning segments of mineral compounds 
congregate together [20l El] . The sequential structure implies that consecutive 
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Figure 2: Example frames from Video 1 (see Section X for more details). Each 
frame is a data sample and each scene in the video corresponds to a subspace. 
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Figure 3: Three examples of human activities (walking, side-stepping and bal¬ 
ancing) from the HMD database. Each activity lies in it’s own subspace. The 
top row demonstrates the actor wearing the reflective marker suit and the bot¬ 
tom row shows the captured skeletal structure. 


data samples are likely to share the same subspace label i.e. h = h+i-, until of 
course a boundary point is reached. 

This papers main contribution is the proposal and discussion of the Ordered 
Subspace Clustering (OSC) method, which exploits the sequential structure of 
the data. Experimental evaluation demonstrates that OSC outperforms state- 
of-the-art subspace segmentation methods on both synthetic and real world 
datasets. A preliminary version of this paper was published in CVPR14 m- 
The optimisation scheme that was suggested in the preliminary version lacked 
a guarantee of convergence and suffered from huge computational cost. In this 
paper we provide two new optimisation schemes to solve the OSC objective, 
which have guaranteed convergence, much lower computational requirements 
and can be computed in parallel. Furthermore we perform experiments on new 
synthetic and real datasets. 

2. Prior and Related Work 

The state-of-the-art methods in subspace segmentation are the spectral sub¬ 
space segmentation methods such as Sparse Subspace Clustering (SSC) and 
Low-Rank Representation (LRR). Spectral subspace segmentation methods con¬ 
sist of two steps: 


3 
































1. Learn the subspace structure from the data 

2. Interpret the structure as an affinity matrix and segment via spectral 
clustering 

The main difference between spectral methods is in their approaches to learning 
the subspace structure. 

To learn the subspace structure of the data, spectral subspace segmentation 
methods exploit the the self expressive property [T]: 

each data point in a union of subspaces can be efficiently recon¬ 
structed by a combination of other points in the data. 

In other words a point in a subspace can only be represented by a linear 
combination of points from within the same subspace. Unless the subspaces 
intersect or overlapping, which is assumed to be extremely unlikely in practice. 
This leads to the following model 


ai = Azi (1) 

where Zi € is a vector of coefficients, which encode the subspace structure. 
Due to the self-expressive property the non-zero elements of will correspond 
to samples in A that are in the same subspace as sample i. Therefore learning 
the coefficient vectors for each data sample can reveal some of the underlying 
subspace structure. The model can be expressed for all data points as 

A = AZ 


where columns of Z = [zi, Z 2 ,..., zn] € . 

After learning Z the next step is to assign each data point a subspace label. 
The first step in this process is to build a symmetric affinity matrix. The affinity 
matrix is usually defined as 

W = |Zf + |Z| (2) 

where element Wij of W is interpreted as the affinity or similarity between data 
points i and j. Next this affinity matrix is used by a spectral clustering method 
for final segmentation. Normalised Cuts (NCut) is the de facto spectral 
clustering method for this task [U [23] . 

So far it has been assumed that the original and clean data A is observed. 
Unfortunately this ideal situation is rare with real world data. Instead the 
data is usually corrupted by noise in the data capture process or during data 
transmission of the data. Therefore most subspace clustering methods assume 
the following data generation model 

X = A-kN 

where A is the original data where each point (column) lies on a subspace and 
N is noise. N follows some probability distribution. Two common assumptions 
for N are Gaussian distribution and Laplacian distribution. 
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Since it may be difficult to isolate the original data A from the noise N, 
most subspace clustering methods actually address the issue of noise by allowing 
greater flexibility in the self-expressive model. The self-expressive model usually 
becomes 

X = XZ -I- E 

where E is a fitting error and is different from N. 

2.1. Sparse Subspace Clustering 

Sparse Subspace Clustering (SSC) was originally introduced by Elhamifar & 
Vidal [I][24]. SSC adopts concepts from the domain of sparse models, namely 
that 


there exists a sparse solution, z^, whose nonzero entries correspond 
to data points from the same subspace as . 

In the case where the observed data is noiseless, i.e. we have A, each data 
point lying in the di-dimensional subspace Si can be represented by di points. 
This corresponds to the sparse representation of points, ideally a sparse so¬ 
lution should only select coefficients belonging to the same subspace as each 
point. Furthermore the number of non-zero coefficients should correspond to 
the dimension of the underlying subspace. The sparsity goals of SSC could be 
achieved through a solution to the following 

nun ||Z||o, s.t. A = AZ, diag(Z) = 0 , (3) 

where || • ||o is called the £q norm and is defined the number of non-zero entries. 
The diagonal constraint is used to avoid the degenerate solution of expressing 
the point as a linear combination of itself. However this problem is intractable, 
instead the convex relaxation £i norm is used 

nun ||Z||i, s.t. A = AZ, diag(Z) = 0. (4) 

The II • 111 is the £i norm and is defined as \^ij\ i-®- the sum of 

absolute values of the entries. We call this heuristic SSC. 

To overcome the simultaneous presence of noise and outliers, Elhamifar & 
Vidal [21] devised the following alternative 

min ^||E||^ + A 2 ||S||i + ||Z||i (5) 

s.t. X = XZ + E -I- S, diag(Z) = 0 

where || • ||f is the Frobenius norm and S is high magnitude sparse fitting error. 
This model allows for flexibility in the fitting error since setting either Ai or A 2 
to 0 eliminates E or S from the model. This only compensates for fitting errors 
however shows surprising robustness in practice. 

Recent work by Soltanolkotabi, Elhamifar and Candes |25j showed that un¬ 
der rather broad conditions using noisy data X the £i approach should produce 
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accurate clustering results. These conditions include maximum signal-to-noise 
ratio, number of samples in each cluster and distance between subspaces and 
appropriate selection of parameters. They use the following relaxed objective 

min i||xi - Xzilll + Ai||zi||i, s.t. diag(Z) = 0. (6) 

Zi z 

with regularisation parameter Ai tuned for each data sample. 

In practice SSC allows for efficient computation. Each column of Z can be 
computed independently and in parallel with the other columns. In contrast 
to LRR (discussed next) the computational requirements are lightweight. Fur¬ 
thermore Z is sparse, which can reduce memory requirements and decrease time 
computational requirements and time spent during the final spectral clustering 
step. 

2.2. Low-Rank Subspace Clustering 

Rather than compute the sparsest representation of each data point indi¬ 
vidually, Low-Rank Representation (LRR) by Liu, Lin and Yu [23] attempts to 
incorporate global structure of the data by computing the lowest-rank represen¬ 
tation of the set of data points. Therefore the objective becomes 

min rank(Z), s.t. A = AZ. (7) 

This means that not only can the data points be decomposed as a linear combi¬ 
nation of other points but the entire coefficient matrix should be low-rank. The 
aim of the rank penalty is to create a global grouping effect that reflects the un¬ 
derlying subspace structure of the data. In other words, data points belonging 
to the same subspace should have similar coefficient patterns. 

Similar to SSC, the original objective for LRR is intractable. Instead the au¬ 
thors of LRR suggest a heuristic version which uses the closest convex envelope 
of the rank operator: the nuclear or trace norm. The objective then becomes 

min ||Z||», s.t. A = AZ (8) 

where || • ||* is the nuclear norm and is the sum of the singular values. The 
singular values can be computed through the Singular Value Decomposition 
(SVD). 

LRR has achieved a lot of attention in the subspace segmentation commu¬ 
nity, which had led to some interesting discoveries. The most surprising of which 
is that there is a closed form solution to the heuristic noiseless LRR objective. 
The closed form solution is given by the Shape Interaction Matrix (SIM) and is 
defined as 

Z = VaVX 

where Va are the right singular vectors given by the SVD of A. 

In the case where noise is present, the authors of LRR suggested a similar 
model to that used in SSC. However they assume that their fitting error will 
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be only be present in a small number of columns. This results in the following 
objective 

min A||E||i ,2 + ||Z|U, s.t. X = XZ + E, (9) 

E.Z 

where ||E||i ,2 = Z]"=i ll^ilU is the £ 1,2 norm. 

Even though LRR has shown impressive accuracy performance in many sub¬ 
space segmentation tasks it has two drawbacks: 

• high computational cost, 

• large memory requirements. 

LRR’s high computational cost comes from the required computation of the 
SVD of Z at every iteration. Depending on the convergence tolerance LRR may 
iterate hundreds or even thousands of times. However some improvements have 
been made by computing partial or skinny SVD approximations. Similarly the 
large memory requirements of LRR stem from the computation of the SVD of 
Z. Since the number of elements in Z scales quadratically with the number of 
number of data samples it may not be possible to apply LRR even for modest 
datasets. Work has been done in fast approximations of SVD El 127] but it has 
not yet been applied to LRR at the time of writing. 


3.3. Regularised Variants 

Laplacian Regularised LRR [28] and LRR with Local Constraint [29] incor¬ 
porate Laplacian regularisation to ensure that data points close in the ambient 
space share similar coefficient structure. The objectives for both approaches can 
be generalised to 

N 

mm/(Z)-f -z,|| 2 , s.t. X = XZ-P E, (10) 

i 

where /(Z) is a placeholder for a fitting term and other regularisation such as 
nuclear norm or £1 on Z and Wij is a weight based on distance between sample 
i and j. 

Spatial Subspace Clustering (SpatSC) [20] extended SSC by incorporating a 
sequential £1 neighbour penalty 

- ||E|^-|-Ai||Z||i-|- A 2 IIZRII 1 , (11) 

Z,E 

s.t. X = XZ -I- E, diag(Z) = 0, 


where R is a lower triangular matrix with —1 on the diagonal and 1 on the 
second lower diagonal: 


R e 


-1 

1 -1 

1 -1 


( 12 ) 


1 -1 
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Therefore ZR = [z 2 — zi,Z 3 — Z 2 , ...,zjv — zjv_i]. The aim of this formulation 
is to force consecutive columns of Z to be similar. 


3. Ordered Subspace Clustering 

The assumption for Ordered Subspace Clustering (OSC) [19] is that the 
data is sequentially structured. Since physically neighbouring data samples are 
extremely likely to lie in the same subspace they should have similar coefficient 
patterns. Consider a a video sequence from a television show or movie. The 
frames are sequentially ordered and each scene lies on a subspace. Since the 
scene changes are relatively rare compared to the high frame rate it is extremely 
likely that consecutive frames are from the same subspace. In other words the 
columns of Z should follow the rule z^ « z^+i. 

Similar to SpatSC, OSC extends SSC with an additional regularisation penalty. 
The objective is as follows: 

min^||E|l^ + Ai|lZ||i+A2||ZR|li,2,s.t. X = XZ + E. (13) 

Z.E 


where R is defined as in (12). 

Instead of the norm over ZR as used in SpatSC, OSC uses the £ 1,2 norm 
to enforce column similarity of Z. In contrast to SpatSC this objective much 
more strictly enforces column similarity in Z. ||ZR||i only imposes sparsity 
at the element level in the column differences z^ — Zi_i and does not directly 
penalise whole column similarity. Therefore it allows the support (non-zero 
entries) of each consecutive column to vary. In effect this allows some values 
in consecutive columns to be vastly different. This does not meet the stated 
objective of z^ « z^+i. 

Thus in (13), the weak penalty ||ZR||i from SpatSC has been replaced with 
the stronger penalty ||ZR||i ^2 to strictly enforce column similarity. We remove 
the diagonal constraint as it is no longer required in most cases and can interfere 
with column similarity. However we discuss in the following section how to 
include the constraint if it is required. 


4. Solving the Objective Function 


In the preliminary version of this paper 


a procedure to solve the relaxed 
version of the objective (131 was discussed. However the former procedure 
lacked a guarantee of convergence. Furthermore the procedure suffered from 
huge computational complexity due to the expensive Sylvester equation [3(11 [31] 
required. 

In this paper two new procedures are discussed for relaxed and exact vari¬ 
ants, both of which have guaranteed convergence and reduced computational 
complexity. For a demonstration of the speed improvements please see Sec¬ 
tion and Figure There improvements have been achieved through adop¬ 
tion of the LADMAP (Linearized Alternating Direction Method with Adaptive 





Penalty) [32], [33] and LADMPSAP (Linearized Alternating Direction Method 
with Parallel Spliting and Adaptive Penalty) |31j frameworks. Overviews of the 
procedures can be found in Algorithms and 


4.I. Relaxed Constraints 


The relaxed variant of (13) can be written as: 


min- IIX 
z.j 2 


XZ||| + Ai||Z||i + A2||J||i,2, 


s.t. J = ZR. 


(14) 


Then the Augmented Lagrangian for the introduced auxiliary variable con¬ 
straint is, 


1 , 


/:(Z,J,Y|/r)=A|lX-XZ"2 


F + All 


(Y,J-ZR) + ^||J-ZR||f 


1 + A 2 II J||l,2 

2 


(15) 


Objective (151 will be solved for Z and J in a sequential and alternative 
manner when fixing the other, respectively. Given the solution state Z*^, J^, Y^ 
and adaptive constant the procedure for k = 1,2,... is as follows: 

1. Update Z^+^ by solving the following subproblem 


Z'^+i = argmin£(Z, J^ Y'^l/). 


(16) 


which is equivalent to 

Z'^+i =argminAi||Z||i + i||X- XZ|1^ (17) 

z ^ 

-f (Y'=, - ZR) -I- y IIJ'^ - ZRIII. 

There is no closed form solution to the above problem because of the 
coefficient matrices X and R on Z. Thus linearisation over the last three 
terms is used. Denote g{Z) = 5 IIX — XZ|||. and h{Z) — (Y^, J'"' — ZR) + 

k 

^||J^ — ZR|||. which is from the augmented Langrangian. The linear 
approximation at Z^ |3S| for g{Z) and h{Z) respectively, is 


5(Z)^ 

a (V 5 (Z'=),Z-Z'=) 

^^\\z-z^\\l 

(18) 

h(Z)R 

:: {\Ih{Z^),Z-Z^) 

+ ^||Z-Z'=|!|. 

(19) 


where and 

Vg(Z'=) = -X^(X- XZ^), 

V/i(Z'=) = -(Y'^ -f /(J'^ - Z'=R))R'^. 
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Denote 


1 


yfe = 


[x'^'(x-xz*) + y'=R'^i 


O-J +L, 


where 


= r'= +^''(J'= - Z'^'R) 


then problem (16) can be approximated by the following problem 

crj + ^2 


Z^'*'^ = argmin Ai|jZ||i 

z 


Z-V" 


( 20 ) 


Problem (20) is separable at element level and each has a closed-form 
solution defined by the soft thresholding operator, see [SSlISZj, as follows 

Ai 


Z'^+i = sign (V'=) max f |V'=| - ) 


( 21 ) 


2. Given the new value Z^+^ from last step, is updated by solving 
j'^+i = argmin£(Z'=+\ J, Y'^l/') 

j 


= argminA 2 ||J||i ,2 + (Y^ J - Z^^+^R) + ^||J - Z'^+^RH^. 

j ^ 

The linear term is easily absorbed into the quadratic term such that a 
solvable problem can be achieved as follows. 


minA 2 ||J||i ,2 + ^||J - Z^+^R+^Y 


1 


rk\\2 


( 22 ) 


where aj = with a constant r]j > iQ Denote by = Z^+^R — 
^Y^, then the above problem has a closed-form solution defined as fol¬ 
lows. 


J. 



if |lu,|| > ^ 

^ J 

otherwise 


(23) 


where J,- and Ui are the i-th columns of and U^, respectively. Please 
refer to [23]. 

3. Update Y'^+i by 


Y'^+i = -h/(J'=+i - Z'^+iR) (24) 


^Ideally rjj = 1. For the purposes of convergence analysis in Section 5.4 r)j is set to larger 
than 1. 
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4. Update adaptive constant by 

= min(/ii„axi,7/^^) 

The entire procedure for solving the relaxed OSC objective is summarized in 
Algorithm 1. This set of updating rules is a generalisation of those in LADMAP 
[HH] . as such it will be referred to as v-LADMAP for short. Note however, that 
in the original LADMAP the linearisation is performed only on the augmented 
Lagrange term, i.e. on /i(Z), based on which the convergence analysis is carried 
out. Whereas in the v-LADMAP case, both h{Z) and g{Z) are linearised in 
order to obtain a closed-form solution to Z. This difference means that the 
convergence analysis in LADMAP is no longer applicable here. As such, detailed 


analysis on the convergence of v-LADMAP is provided in Section 4.4 


4-2. Exact Constraints 

Similar to the relaxed version, auxiliary constraint variables are introduced 
_ -I 

Z,E,J 2 

s.t. X = XZ + E,J = ZR 


min -||E||| + Ai|!Z||i + A2||J||i.2 


(25) 


Then the Augmented Lagrangian form is used to incorporate the constraints 

/:(e,z,j,Yi,Y2|m) 

= i||E||2 +Ai||Z||i + A2||J||i,2 


-H (Yi,XZ-X-HE) -K ^llXZ-X-hE 


Ml 


-f (Y2,J-ZR) + ^||J-ZR 


(26) 


In problem (26), there are three primary variables Z, E and J, so a sim¬ 


ple linearised ADM as used in the previous subsection may diverge in the 
multi-variable case as demonstrated in [33]. To overcome this the so-called 
Linearized Alternating Direction Method with Parallel Splitting and Adaptive 


Penalty method (LADMPSAP) is adopted, which for problem (26), consists of 
the following steps, see [34] : 

1. Update Z'^+i 

,k 


mmAi||Z||i -f (Yf,XZ-X- 


E'=) -h — IIXZ-X- 




+ (Y2^J'=-ZR) + ^|1J'=-ZR||| 


Define 


( 27 ) 


F(Z) =(Yf,XZ-X-hE'=) -H ^||XZ-X 


-E'^lll 


(Y2^J^■-ZR)-f ^||J^--ZR|||. 
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By linearizing F, (27) can be approximated with the following proximal 
problem 


CTr II „ /ryfc 1 


lAi|1Z||i + ^||Z-(Z'=--VF(Z 


fc^M|2 
F 


where = ^JL^r|z {rjz is an appropriate constant) and 

VF(Z'=) =X^{Y'l + /(XZfc - X + E'=)) - (Y^ + - Z'=R))R'^. 

As discussed before the solution is given by the soft thresholding operator 
dehned in (|2^ with V = (Z^' - ^VF(Z'=)). 

2. Update by 




mm-llElP ' v-7fc 


+ (Yf,XZ'= -X + E) 


IXZ^'-X + EII 


This is a least square problem whose solution can be given by 


E^+i _ 


XZ'^ -X 


\rk 
^ ^ 1 


lu + 1 


(28) 


3. Update by 

minA2||J|ll,2 + (Y2^J-Z'=R) 


IIJ - Z'=R|1 


The solution can be obtained by using (23) with Z^+^ replaced by Z^. 


4. Update multipliers with the new values of primary variables by 
Yfe+i ^Y'l + /(XZ'=+^ - X + E'=+1) 

fc + 1 _-XT-fc 


Y 


=Y| + - Z'^+iR) 


5. Update 


M = min(/rinaxi, 'y^J■ ) 

The entire procedure is summarised in Algorithm 
4-3. Diagonal Constraint 

In some cases, it may be desirable to enforce the constraint diag(Z) = 0 i.e. 
we should not allow each data point to be represented by itself. The objective 
becomes 


min -||E|||i + Ai||Z||i + A2 ||ZR||i_ 2 
s.t. X = XZ + E, diag(Z) = 0 


(29) 
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Algorithm 1 Solving (14) by v-LADMAP 


Require: - observed data, Ai, A 2 - regularisation parameters, /i, /r” 

^ - rate of descent parameters and € 1,62 > 0 . 

Initialise J'= = Q^xN-i^ ^ qNxn 

while not converged do 
Find by using 

Find by using (23) 


» 


Check stopping criteria 


||jfe+i _ 2fc+iR||^ < 


/max(|jZ"-+i - Z>^\\f, ||J'=+' - J1|f) < £2 
6: +/(J'=+1 - Z'^+iR) 

7: Update 7 

fyO if /i'=max{||Z'=+i-Z'^II f, 

7=< IIJ'^+i-J'=|! f} <£2 

I 1 otherwise, 

8: = min(/rmaxi,7M'') 

9: end while 
10: return Z 


To enforce such a constraint it is not necessary to significantly alter the afore¬ 
mentioned optimisation schemes. This constraint only affects the step involving 
Z. Since this step is the soft shrinkage operator and is separable at the element 
level one can simply set the diagonal entries to 0 afterwards. In other words the 
Z update solution (21) becomes 


Zij — 


0 

sign {V,j 


max 


^ul- 


if i = j 
otherwise. 


(30) 


4-4- Convergence Analysis for Algorithms 

LADMPSAP adopts a special strategy that updates all primary variables in 
parallel using their values from the last iteration. See Step 2 to Step 11 and 
the equations they refer to. The LADMPSAP algorithm for problem ( [2^ is 
guaranteed to converge. For the convergence analysis, please refer to |34j. The 
convergence theorem is repeated here with some modifications reflecting the 
settings in our problem. 


Theorem 1. If fjf is non-decreasing and upper bounded, 

then the sequence {(Z^', J^, Yf, Yf)} generated by Algorithm 2 converges to 

a KKT point of problem (25). 


Differently in v-LADMAP, which is used to solve the relaxed objective, up¬ 
dating the primary variables is performed in sequence. Meaning that one up- 
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Algorithm 2 Solving (25) by LADMPSAP 


Require: X 


DxN 


^max » p > 


observed data, Ai, A 2 - regularisation parameters, /r, 
||X|P - rate of descent parameters and ei, £2 > 0. 


Initialise S = U = SR, Yi = 

while not converged do 
Find by using (f30) 

Find by using (28) 


Yo = 1 


NxN-l 


0 


NxN 


Find by using (23) with replaced by Z^ 


Check stopping criteria 


\\XZ^+^ -X + E'^+^Wf 

||jk+i _ zk+ijj^ii^ 

|^max{|iZ'=+i - Z>^\\f, HE'^+i - E'^H, 

||jfc+i _ ||Z'=+iR- Z'^RIIj^)} < £2 


7 : Y^+i =Yf+^J(XZ‘^+i-X + E'=+i) 

8: Y^+1 = Yf+^^(J'=+1-S'=+1R) 

9 : Update 7 

'70 if |^max{||Zfc+i-Z'=||^, 

I ||E'=+i-E'=||,||J'=+i-J'=||^, 

||Z'=+iR-Z'=R||f} <£2 
1 otherwise, 

10: = min(^maxi,7M*) 

11: end while 
12: return Z 
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dated primary variable is used immediately to update another primary variable 
so that the optimisation is carried out by alternating directions sequentially. In 
Step 2 in Algorithm 1, the updated value of is used to obtain The 

proof of convergence for LADMAP does not completely extend to v-LADMAP 
and since variables are updated sequentially the convergence from LADMPSAP 
does not apply either. As such the convergence theorem for v-LADMAP is 
presented in the remainder of this section. 

Consider the original relaxed constrained version (14). The KKT conditions 
of problem (14) lead to the following: there exists a triplet (Z*, J*,Y*) such 
that 


J* = Z*R; -Y* G A29||J*||i,2 (31) 

X'^(X-XZ*)+Y*R^ G Aia||Z*||i. (32) 

where d denotes the subdifferential. 

Lemma 1. The following relations hold 

T% =-{a'f + L,){Z'^+^-Z^)+Y'^'R^ (33) 

-I- X^X(Z'=+i - Z'=) G Vg(Z'=+i) -t Aia||Z'=+i||i 

- J'^) - y'^' g Aa^ll j''+^|ii,2, (34) 

where 

yfc = - 1 - /(J'^ - Z'^R) (35) 

+ - Z'^+iR) (36) 


Proof. Checking the optimality conditions of two subproblems (20) and (22) for 
Z^+^ and leads to the above claims. □ 


Lemma 2. For the sequence generated hy Algorithm [7] the following identity 
holds 


iv. + Lz(/)-')||Z'=+i - Z*||| - ||(Z'=+i - Z*)R||| 

+ - rill + (/)-||Y'=+i - Y*||| 

=(r,, + L,(/)-i)||Z^ - Z*||| - ||(Z'= - Z*)R||| 

+ ,?^||j'=-r||| + (/i'=)-||Y^--Y*||| (37) 

- {(/)-||Y'=+i - Y'^lll + 77j|lJ'=+' - J'^lll 

-2(/)-i(Y'=+i-Y^J'=+l-J'^)} (38) 

- ((r,, + L,(/)-i)||Z'=+i - Z'=||| - IKZ'^+i - Z'=)R|||) (39) 

-2(^'=)-i(Z'=+1-Z*,T|-Y*R^) (40) 

- 2(/)-^(r+i - J*,T^Y*) (41) 

-f 2(/i'=)-i(Z'=+i - Z*,X^X(Z'=+^ - Z'^)) (42) 
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where L^, rjz and rjj are the constants used in linearisation (181, (191 and (22), 
respectively. 

Proof. This identity can be checked by using the definition of Y^' and Y^', see 
(35) and (361, and using the following identities 

J* = Z*R; 

2(a - b, a - c) = lla - blP - lib - clP + lla - clP, 


as well as the updating rule for Y^+^, see (24). Since the full proof is lengthy 
and tedious it is omitted here. □ 

Before the most important lemma it is necessary to introduce the following 
inequalities. 

Lemma 3. The following inequalities hold with r]j > 1 

D’^ = (/)-"||Y'=+i - Y'^lll, + r]j\\3^+^ - JYf 

-2(/)-1(Y'=+i - Y*^,j'=+i - > 0 (43) 

(Z'=+1-Z*,T|-Y*R^) >0 (44) 

- J*,T^ + Y*) > 0. (45) 


Proof. (43) is due to Cauchy inequality. (44) and (45) are the results of com¬ 
bining the convexity of the objective functions (20) and (22), which are used to 
update Z^+^ and respectively, with the following inequality 

{x -y,Px- Py) > 0 , Vpx G df{x) and py G df{y) 

where /(x) is any convex function. □ 

Next the most important lemma is presented. 

Lemma 4. ///r^ is increasing, > ||R-|p 7 Vj > — > Li/(? 7 z — ||R|p) 

and (Z*, J*, Y*) is any KKT point of problem (14), then the sequence generated 
by Algorithm 1 satisfies 

1. 5^= 4 (r?. + L,(/)-i)||Z'= - Z*||2, - IKZ'^ - Z*)R|||, + yj\\J^ - J*||| -f 
(/r*^)-2||Y'= - Y*|||. is nonnegative and nonincreasing; 

2. IIZ'^+i - Z'^IIf ^ 0 , IIJ'^+i - J'^IIf ^ 0 , and HY'^+i - Y'^Hf ^ 0 . 

Proof. For claim 1) note that 

2(z'=+i - Z*, X^X(Z'=+i - Z*^)) 

<211X^X11 ||Z'=+i - Z'=||||Z''+^ - Z*|| 


<L, 


T 


k+l 


^k+1 _ ^k 


Zfc-hi _ 2 . 


k\\2 


+ 


/+! - / 


Z'=+1 - z 


*\\2 


where we have chosen L, = ||Xp. 
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Then from Lemma and Lemma and noting is increasing, we have the 
following 


gfc+i -D'^ + IjZ'^+i - Z'=|||||R||^ - {r,, - 




^k+l _ ' 


fc+1 tyk II 2 


|Z'=+-^ - z 


(46) 


Now it is easy to check that when 

^k+i _ > 




(47) 


the sum of last two terms in (461 is nonnegative. Hence the claim 1) has been 
proved. 

Regarding claim 2), is non-increasing and nonnegative, thus it must have 
a limit, denoted by s°°. If we take the sum over (461 for all the iterations k, we 
have 




.+1 - IIi^IIf) 




k^l 


+00 


!Zfc-n_zfe||^<si-s~ 


Hence 'I2k^i bounded, under the condition 

([47|). This gives 

IIZ'^+i - ^ 0 and D’^ 


0 . 


It is easy to check that 

Dy > (7?j-l)||J'=+^- 


which means — J^||f 0 . 

By using the similar strategy, we have 




rfe II \2 


> ((/)-1|Y'=+l - Y'^ll - - J 

Hence we have || Y^'+^ — Y^jj^ —>■ 0. This completes the proof for claim 2). □ 

Theorem 2. Under the conditions of Lemma 0 the sequence {(Z*^, J^, Y^)} 
generated by Algorithm 1 converges to a KKT point of problem (14). 

Proof. By claim 2) of Lemmawe know that the sequence {(Z^, J^, Y^)} is 
bounded, hence it has an accumulation point, denoted by 

(Zfc., jfc.,Y'=") ^ (Z°°, J°°,Y°°). 


First we prove that (Z°°, J°°, Y°°) is a KKT point of problem (14). Accord¬ 
ing to update rule (24), we have 

jfc+i _ 2fe+ij^ ^ - Y^) 0. 


This shows J°° = Z°°R, i.e., any accumulation point is a feasible solution. 
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Taking /c = fc„ — 1 in Lemma and using the definition of subdifferential, 
we have 

i||X-XZ'="||| + Ai|iZ'="||i + A 2 ||J'=’'||i ,2 

<i||X-XZ*||| + Ai||Z*||i + A2||r||i.2 
+ (Z'=’' - Z*, + L.)(Z'=" - + X^X(Z'=" - Z'='*-i)) 

+ (jfc. _ j*, 

Let n —)■ + 00 , by using the fact in Lemma we can see 

i||X-XZ-||| + Ai|lZ-|!i + A 2 ||J°“|!i .2 
<i||X-XZ*||| + Ai|iZ*||i + A2||r||i.2 

(ZOO _ z*,Y°°R^) + (J°° -r,-Y°°) 

=i||X-XZ*||| + Ai|lZ*||i + A2||r||i,2. 


because {Z°°,3°°) is a feasible solution. So we conclude that (Z°°, J°°) is actu¬ 
ally an optimal solution to ( |l4| . 

In a similar way, from Lemma[^ we can also prove that — Y°° G A 29 |j J°°||i _2 
andX^(X-XZ~)+Y°°R^ G Aia||Z~||i. Therefore, (Z°°, J“, Y~) is a KKT 


point of problem (14). 

Taking(Z*,J%Y*) = (Z 
Z^Wl - \\(Z^- - Z-)R||| - 


Vj 


°,Y° 

U'^" - J 


) in LemmaH we have (? 7 ^+Lz(^*") ^)||1 
- J“lli + (/?")-2|1Y'=- - Y°°\\l -G 0. 


Z'^"- 


_. \Z^-Z° 

IlYfe _ y“|| 2 . ^ 0 for all k - 


2 


+ 00 . Hence 


With claim 1) of Lemma we have (% + Lz{^^) 

Z-)R||| + r,j\\3'^ - J~||| + " 

(Z^J^Y*^) = (Z“, J°°,Y°°). 

As (Z°°, J°°, Y°°) can be any accumulation point of {(Z*^, J^, Y^)}, we con¬ 
clude that {(Z^', J^, Y^)} converges to a KKT point of problem (14). This 
completes the proof of the theorem. □ 


5. Segmentation 


Once a solution to (13) has been found, the next step is to use the information 


encoded in Z to produce subspace labels for each data point. In the simple case 
that Z is strictly block-diagonal, one can use the non-zero columns of matrix ZR 
to identify the change from one block (subspace) to another since ZR = [z 2 — 
zi, Z 3 — Z 2 ,..., zjv — zjv-i]- By strictly block-diagonal matrix we mean a matrix 
with columns already ordered so that its appearance is block-diagonal. We 
distinguish strictly block-diagonal with general block-diagonal matrices which 
can be strictly block-diagonal once reordered. In this field general block-diagonal 
matrices refers to matrices where the non-zero entries link each data point to 
every other data point in the same subspace |38j . However obtaining a strictly 
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block-diagonal matrix is rarely seen in practice due to noisy data and it assumes 
that the subspace will only occur once in the sequence. 

The case of strictly block-diagonal W (as defined in (§) is a special case 
of the more general unique and connected subspace assumption. Under this 
assumption we know that once a subspace stops occurring in the sequence of data 
it will never occur again. In practice it is unlikely that W will be exactly block- 
diagonal but we often assume that subspaces will be unique and connected. 
When Z is not block-diagonal there will be a large number of non-zero columns 
of ZR, therefore we cannot use the earlier described method. Instead one can 
apply some minor post processing to ZR to find the boundaries of subspaces. 
First let B = |ZR| be the absolute value matrix of ZR. Then let b be the 
vector of column-wise means of B. Then we employ a peak finding algorithm 
over b, where the peaks are likely to correspond to the boundaries of subspaces. 

A more robust approach is to use use spectral clustering. The matrix Z is 
used to build an affinity matrix of an undirected graph. The affinity matrix 
or similarity graph is defined as W = |Z| -|- |Z|^. Element Wij corresponds to 
the edge weight or affinity between vertices (data points) i and j. Then we use 
the spectral clustering technique, Normalised Cuts (NCUT) [22], to obtain final 
segmentation. NCUT has been shown to be robust in subspace segmentation 
tasks and is considered state of the art [U [23] . In cases where Z is not block- 
diagonal or contains significant noise, NCUT will provide better segmentation 
than other spectral clustering methods. Algorithm summarises the entire 
proposed algorithm. 


Algorithm 3 Ordered Subspace Clustering Procedure 
Require: - observed data 

1: Obtain the sparse coefficients Z by solving the relaxed or exact objective 
2: Form the similarity graph W = |Z| -|- |Z|^ 

3: Estimate the number of subspaces k from W 

4: Apply spectral clustering to W to partition the data into k subspaces 
5: return Subspaces 


5.1. Estimating the number of subspaces 

Spectral clustering techniques require the number of clusters to be declared 
before hand. In many cases the number of subspaces in the data is unknown. 
Fortunately the number of clusters can be estimated from the affinity matrix 
W. Here we suggest some possible estimation techniques. 

For general unordered block-diagonal Z or where the subspaces may be reoc¬ 
curring in the sequence the number of clusters can be obtained from the singular 
values of W [35]. Specifically the number of non-zero singular values of Z or the 
number of zero singular values from the Laplacian matrix L corresponds to the 
number of blocks or subspaces. The Laplacian matrix is defined as L = D — W 
and D is a diagonal matrix where Du = ^ij ■ Prior work has suggested 
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using the normalised Laplacian L„ = I —however we noted no 
structural difference between L and L„. 

Singular values can be used even when the matrix W is not block diagonal 
due to noise. In this setting noise refers to non-zero entries of W that link data 
points belonging to different subspaces. However the raw values require some 
processing since there will be a large number of non-zero singular values. One 
can threshold these values as suggested in [55]. In other words any singular 
value less than a given value should be ignored. We can express this as 

N 

i=l 


where t is the threshold value. 

Singular value thresholding can produce acceptable results but it requires 
user selection of the threshold value. A more automatic approach is to use either 
the Eigen-gap |2] or the closely related SVD-gap [39] heuristic. The Eigen-gap 
heuristic uses the eigenvalues of W or L to find the number subspaces by finding 
the largest gap between the ordered eigenvalues. Let be the descending 

sorted eigenvalues of W such that 5i > S 2 > ■ ■ ■ > Sn- Then k can be estimated 
by 

k = argmax (Si — ^i+i) 

The SVD-gap heuristic is the same procedure with eigenvalues of W replaced 
with singular values [55] . 

6. Experimental Evaluation 

In this section the performance of OSC is compared against SSC, LRR and 
SpatSC methods with a variety of data sources. Parameters were fixed for each 
experiment. In order to evaluate performance consistently NCUT was used for 
final segmentation for every method in every experiment. MATLAB imple¬ 
mentations were used from the open source SubKil|^ package. Implementations 
for relaxed and exact variants of OSC have been included in this library. The 
relaxed v-LADMAP variant of OSC was used in all experiments. 

We use the subspace clustering error metric from [T] to evaluate clustering 
accuracy. The subspace clustering error (SCE) is as follows 

num. misclassified points , . 

SCE =-----^- (48) 

total num. of points 

Furthermore additional noise is injected into the data to test robustness. We 
report the level of noise using Peak Signal-to-Noise Ratio (PSNR) which is 


^https://github.com/sjtrny/SubKit 
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Experiment 

Ai A 2 diag(Z) = 0 

OSC 

Synthetic and Semi-Synthetic 

SSC 

0.1 1 1 0 

0.1 0.01 1 1 

0.4 0 

0.2 1 

OSC 

Video and Activity Segmentation ^^RR^ 

SSC 

0.1 1 1 0 

0.1 0.01 0.1 1 

0.4 0 

0.1 1 


Table 1: Overview of parameters used for each experiment. 


defined as 


PSNR= lOlogio 


( _ ^ 

y mn 



(49) 


where X = A + N is the noisy data and s is the maximum possible value of an 
element of A. Decreasing values of PSNR indicate increasing amounts of noise. 

In contrast to other works mug the minimum, maximum, median and mean 
statistics on clustering error are provided for the comparative experiments. It is 
important to consider these ranges holistically when evaluating these methods. 
In all experiments Gaussian noise was used with zero mean and unit variance. 
When parameters are fixed we report them in Tabled 


7. Synthetic Subspace Segmentation 


In this section evaluation is performed using randomly generated subspace 
structured data. Similar to [53] 5 subspaces are constructed whose bases 

{UJLi are computed by Ui+i = TUi, 1 < J < 4, where T is a random rotation 
matrix and Ui is a random orthonormal basis of dimension 100 x 4. In other 
words each basis is a random rotation away from the previous basis and the 
dimension of each subspace is 4. 20 data points are sampled from each subspace 
by X, = U,Q, where G is a random gaussian multi-variate matrix 

with row-wise variance of 0.001 and 0.0005 between neighbouring columns i.e. 
the following covariance matrix: 


C e ]R20 x20 


0.001 

0.0005 


0.0005 

0.001 

0.0005 


0.0005 

0.001 


0.0005 


0.0005 
0.0005 0.001 


This mimics the assumption that consecutive data points within the same 
subspace are similar to each other. Finally the data is concatenated X = 
[Xi,X2,...,X5]. 
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Figure 4: Results for the synthetic data segmentation experiment with various 
magnitudes of Gaussian noise. OSC (ours) outperforms SpatSC, LRR and SSC 
in the majority of cases. 



Figure 5: Average running time of OSC implementations for increasing amounts 
of data. The v-LADMAP procedure is a significant improvement over the pro¬ 
cedure (CVPR14) suggested in the preliminary version of this paper. 


We repeated the experiment 50 times with new random bases and coefficient 
matrices each time. Furthermore we repeated the experiment with various levels 
of noise to determine robustness. Results are reported in Figure]^ OSC (ours) 
demonstrated significantly better clustering accuracy than SpatSC, LRR and 
SSC in all metrics. Even in cases of extremely noisy data (low PSNR) OSC still 
demonstrates excellent accuracy. 

8. Running Time 

To demonstrate the improvements in running time we compare the the opti¬ 
misation scheme suggested in the preliminary version [19j of this paper and the 
v-LADMAP procedure suggested in this paper. Synthetic data was generated 


22 































Figure 6: Results for the semi-synthetic data segmentation experiment with 
various magnitudes of Gaussian noise. OSC (ours) outperforms SpatSC, LRR 
and SSC in the majority of cases. 


as in the previous section. However this time the number of samples in each of 
the 5 clusters is progressively increased. At each increase in the number of sam¬ 
ples we repeated both OSC procedures 10 times to obtain an average running 
time. The experiment was performed on a machine with a 3.4 Ghz i7 CPU and 
32GB RAM. Results are reported in Figure From the Figure we can clearly 
see that v-LADMAP provides a significant improvement over the CVPR14 pro¬ 
cedure. The v-LADMAP procedure completes the experiments in a matter of 
seconds where the CVPR14 procedure takes over 10 minutes. The CVPR14 
procedure results exhibits a quadratic run time in the number of samples while 
v-LADMAP is linear. 


9. Semi-Synthetic Experiment 

Semi-Synthetic data is assembled from a library of pure infrared hyper spec¬ 
tral mineral data as in |20j . Similar to the synthetic experiment 5 subspaces are 
created with 20 data samples in each. For each subspace 5 spectra samples are 
randomly chosen as the bases such that Ui S The 20 data samples are 

then sampled from each subspace by where Qi € is a random 

gaussian multi-variate matrix as defined in the previous section. The data is 
concatenated X = [Xi, X 2 ,..., X 5 ]. 

Similar to the previous experiment we repeated the experiment 50 times 
with new random bases and coefficient matrices each time and we repeated the 
experiment with various levels of noise to determine robustness. Results are 
reported in Figure]^ Again the experiment reveals that OSG outperforms all 
other methods in a majority of cases. 


10. Video Scene Segmentation 

The aim of this experiment is to segment individual scenes, which correspond 
to subspaces, from a video sequence. The video sequences are drawn from two 
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(a) Mean SCE (b) Median SCE (c) Minimum SCE (d) Maximum SCE 

Figure 7: Results for the video scene segmentation experiment (Video 1) with 
various magnitudes of Gaussian noise. OSC (ours) outperforms SpatSC, LRR 
and SSC in the majority of cases. 
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PSNR PSNR PSNR PSNR 


(a) Mean SCE (b) Median SCE (c) Minimum SCE (d) Maximum SCE 

Figure 8: Results for the video scene segmentation experiment (Video 2) with 
various magnitudes of Gaussian noise. OSG (ours) outperforms SpatSG, LRR 
and SSC in the majority of cases. 


short animations freely available from the Internet Archiv^ See Figurej^for an 
example of a sequence to be segmented. The sequences are around 10 seconds in 
length (approximately 300 frames) containing three scenes each. There are 19 
and 24 sequences from videos 1 and 2 respectively. The scenes to be segmented 
can contain significant translation and morphing of objects within the scene and 
sometimes camera or perspective changes, which presents considerable difficulty. 
Scene changes were collected manually to form ground truth data. 

The pre-processing of each sequence consisted of converting colour video 
to grayscale and down sampling to a resolution of 129 x 96. Each frame in 
the sequence was vectorised to G and concatenated with consecutive 

frames to form X € Ri2384x300_ Each sequence was further corrupted with 
various magnitudes of gaussian noise, with the experiment being repeated 50 
times at every magnitude. 

Results can be found in Figures and Generally OSC outperforms other 
methods and the error rates are consistently low when compared to other meth¬ 
ods which greatly increase as the magnitude of the noise is increased. 


^ http://archive.org/ 
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Figure 9: Results for the human activity segmentation experiment with various 
magnitudes of Gaussian noise. OSC (ours) outperforms SpatSC, LRR and SSC 
in the majority of cases. 


11. Human Activity Segmentation 

The aim of this experiment is to segment activities in a sequence from the 
HDM05 Motion Capture Database [301 • This dataset consists of a sequence of 
around 60 joint/joint angle positions for each frame, which has been captured 
at 120 Hz. These positions were determined by optically tracking a number 
of reflective markers on an actor. From the 2D frames containing the marker 
positions software is used to locate these points in 3D space. Then these points 
are transferred into joint and joint angle positions since this format requires less 
storage space. For an example of the capture environment and captured marker 
positions and skeletal structure please see Figure 

Unfortunately there is no provided frame by frame ground truth for this 
dataset. Therefore our ground truth has been assembled by watching the replay 
of the activities and hand labelling the activities using the activity list provided 
by go]. For this experiment we chose scene 1-1, which consists of mostly walking 
activities in various poses but also includes other actions such as double step¬ 
ping or shuffling sideways and multiple turns. This scene contains 9842 frames 
therefore to ease computational burden we divided the scene into four sections 
of between 2000 — 3000 frames each. 

We report subspace clustering error for this experiment in Figure]^ Similar 
to the previous experiments we add increasing amounts of Gaussian noise to 
determine the robustness of each method and repeat the experiment 50 times 
for each magnitude of noise. 

12. Conclusion 

We have presented and evaluated a novel subspace clustering method. Or¬ 
dered Subspace Glustering, that exploits the ordered nature of data. OSC pro¬ 
duces more interpretable and accurate affinity matrices than other methods. We 
showed that this method generally outperforms existing state of the art meth¬ 
ods in quantitative accuracy, particularly when the data is heavily corrupted 
with noise. Furthermore we have provided new optimisation schemes for OSC, 
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which have guaranteed convergence, lower computational requirements and can 
be computed in parallel. 
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